perf(v2 pipeline): parallel agents + LLM iteration cap + tighter arti…#39

Merged
caviri merged 1 commit into develop from feat/v2-pipeline-perf on May 18, 2026

@caviri caviri commented May 18, 2026

…cle prompt

Profile of one hybrid extraction on a 50-person paper-heavy repo
(deeplabcut/deeplabcut): person stage 5:06, **article stage 12:00 on a
single LLM agent invocation that fired 100+ tool calls**, membership
stage 6:37 — total ~25 min, dominated by serial waits and an
unbounded article-agent tool-call loop.

Three independent quick wins, each tunable per deployment:

1. `max_concurrent_agents` default 3→8 in the orchestrator (and the
   `V2_MAX_CONCURRENT_AGENTS` env-resolver default in the API layer
   raised 6→8). Person/membership stages are bottlenecked on the
   `asyncio.Semaphore`, not the LLM provider — bumping the cap absorbs
   wider-fanout repos without saturating RCP.

2. New `_default_usage_limits()` in `V2LLMRuntime`: caps every
   agent invocation at 25 model requests + 50 tool calls via
   pydantic-ai's `UsageLimits`. Without a cap the article agent could
   keep cross-validating the same DOI across five tools indefinitely.
   Overridable per-call (existing kw-only signature) or globally via
   `V2_LLM_REQUEST_LIMIT` / `V2_LLM_TOOL_CALLS_LIMIT`. The cap turns
   a runaway loop into a clean `LLMRuntimeError`, which the per-stage
   runner already handles as a per-item warning.

3. Tightened the article agent system prompt with two "stop early"
   rules: emit immediately once a concrete DOI/title is found (no
   cross-validation past two sources), and emit `{}` after two
   consecutive empty searches instead of looping. The LLM was doing
   ~25 OpenAlex calls per agent invocation chasing the same paper.
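
The concurrency cap in item 1 can be sketched as follows. This is a minimal illustration, not the orchestrator's actual code: `resolve_max_concurrent_agents` and `run_bounded` are hypothetical names, and only the `V2_MAX_CONCURRENT_AGENTS` variable and the default of 8 come from this PR.

```python
import asyncio
import os


def resolve_max_concurrent_agents(default: int = 8) -> int:
    """Read V2_MAX_CONCURRENT_AGENTS; fall back to the default on bad input.

    Hypothetical helper: the real env resolver may validate differently.
    """
    raw = os.environ.get("V2_MAX_CONCURRENT_AGENTS", "")
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


async def run_bounded(items, worker, max_concurrent: int):
    """Run worker(item) for every item, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item):
        # Each task waits here until one of the max_concurrent slots frees up.
        async with semaphore:
            return await worker(item)

    # return_exceptions=True keeps one failing item from cancelling the rest;
    # failures come back as exception objects in the result list.
    return await asyncio.gather(
        *(guarded(item) for item in items), return_exceptions=True
    )
```

With a cap of 3, a 50-item stage runs in roughly 17 sequential waves; with 8 it runs in about 7, which is where the wall-time saving for the person and membership stages would come from if the semaphore is indeed the bottleneck.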

Expected impact on the profiled repo: ~25 min → ~10 min wall time. No
schema, API, or routing changes; the caps are tunable and the defaults
are conservative.
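
The override order for the caps in item 2 (per-call argument, then environment variable, then default) could look roughly like this. It is a stdlib-only sketch under assumptions: `resolve_usage_limit_kwargs` and `_positive_int_env` are hypothetical names, and the body of the real `_default_usage_limits()` is not shown in this PR description.

```python
import os

DEFAULT_REQUEST_LIMIT = 25      # model requests per agent invocation
DEFAULT_TOOL_CALLS_LIMIT = 50   # tool calls per agent invocation


def _positive_int_env(name: str, default: int) -> int:
    """Return the env var as a positive int, or the default if unset/invalid."""
    raw = os.environ.get(name, "").strip()
    if not raw:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


def resolve_usage_limit_kwargs(request_limit=None, tool_calls_limit=None) -> dict:
    """Explicit per-call values win, then the env vars, then the defaults."""
    if request_limit is None:
        request_limit = _positive_int_env(
            "V2_LLM_REQUEST_LIMIT", DEFAULT_REQUEST_LIMIT
        )
    if tool_calls_limit is None:
        tool_calls_limit = _positive_int_env(
            "V2_LLM_TOOL_CALLS_LIMIT", DEFAULT_TOOL_CALLS_LIMIT
        )
    # Assumption: these keys match the UsageLimits fields of the installed
    # pydantic-ai version, so the dict could be passed as UsageLimits(**kwargs).
    return {"request_limit": request_limit, "tool_calls_limit": tool_calls_limit}
```

Falling back to the default on an unparsable value, rather than raising, is a design choice of this sketch; a deployment that prefers to fail fast on a misconfigured variable would raise instead.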
@caviri caviri merged commit da24a0e into develop May 18, 2026
3 of 4 checks passed
@caviri caviri deleted the feat/v2-pipeline-perf branch May 18, 2026 19:12